
JMIR mHealth and uHealth

JMIR Publications Inc.

Preprints posted in the last 90 days, ranked by how well they match JMIR mHealth and uHealth's content profile, based on 10 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
Wearable-derived physiological features for trans-diagnostic disease comparison and classification in the All of Us longitudinal real-world dataset

Huang, X.; Hsieh, C.; Nguyen, Q.; Renteria, M. E.; Gharahkhani, P.

2026-04-13 epidemiology 10.64898/2026.04.07.26350352 medRxiv
Top 0.1%
22.8%

Wearable-derived physiological features have been associated with disease risk, but most current studies focus on single conditions, limiting understanding of cross-disease patterns. This study adopts a trans-diagnostic approach to examine whether wearable data capture shared and condition-specific physiological signatures across multiple chronic conditions spanning physical and mental health, and then evaluates the utility of these features for disease classification. A total of 9,301 patients with at least 21 days of consecutive Fitbit data from the All of Us Controlled Tier Dataset version 8 were analyzed. Disease subcohorts included cardiovascular disease (CVD), diabetes, obstructive sleep apnea (OSA), major depressive disorder (MDD), anxiety, bipolar disorder, and attention-deficit/hyperactivity disorder (ADHD), chosen based on prevalence and relevance. Logistic regression and XGBoost models were fitted for each disease subcohort versus the control cohort. We found that, compared to using just baseline demographic and lifestyle features, incorporating wearable-derived features improved classification performance in all subcohorts for both models, except for ADHD, where improvement was mainly observed for ROC-AUC in the logistic regression model, likely due to the smaller sample size of the ADHD subcohort. The largest performance gains were observed in MDD (increase in ROC-AUC of 0.077 for logistic regression, 0.071 for XGBoost; p < 0.001) and anxiety (increase in ROC-AUC of 0.077 for logistic regression, 0.108 for XGBoost; p < 0.001). This study provides one of the first comprehensive trans-diagnostic evaluations of wearable-derived features for disease classification, highlighting their potential to enhance risk stratification in the real-world setting as a practical complement to clinical assessments and providing a foundation to explore more fine-grained wearable data.
Author summary: Wearable devices such as fitness trackers and smartwatches are becoming increasingly popular and affordable, providing continuous measurements of heart rate, physical activity, and sleep. Alongside the growing digitization of health records, this creates new opportunities for large-scale, real-world health studies. In this study, we analyzed wearable-derived physiological patterns across a range of chronic conditions spanning both physical and mental health to better understand how these signals relate to disease risk. We found that incorporating wearable-derived heart rate, activity and sleep features improved disease risk classification across several conditions, with particularly strong gains for major depressive disorder and anxiety. By examining how individual features contributed to model predictions, we also identified meaningful associations between physiological signals and disease risk. For example, both duration and day-to-day variation of deep and rapid eye movement (REM) sleep were associated with increased risk in certain conditions. Our study supports the development of real-time, automated tools to assess disease risk alongside clinical care.

2
Are different consumer sleep technologies measuring the same essential aspects of sleep?

G Ravindran, K. K.; della Monica, C.; Atzori, G.; M Pineda, M.; Nilforooshan, R.; Hassanin, H.; Revell, V. L.; Dijk, D.-J.

2026-04-01 public and global health 10.64898/2026.03.31.26349815 medRxiv
Top 0.1%
17.4%

Study objectives: Consumer sleep technologies (CSTs) enable low-burden longitudinal sleep monitoring, and their output measures are often interpreted as equivalent to polysomnography (PSG) measures. We applied a measurement reliability-aware approach to determine whether CST-derived 'sleep' measures (1) are interchangeable or device-specific, (2) can reliably assess trait-like sleep characteristics of an individual, (3) can be reduced to latent principal components of sleep, and (4) can be used for classification and biomarker discovery. Methods: Data from 74 older adults (20 people living with dementia [PLWD]) were collected at home (up to 14 nights; total = 752 nights) using four tools simultaneously: research-grade actigraphy (Axivity), a wearable (Withings Watch), a nearable (Withings Sleep Analyzer), and a sleep diary, followed by one in-lab PSG assessment. We used repeated-measures correlation analyses, intraclass correlation coefficients (ICC), principal component analysis (PCA) and binary classification models to address our objectives. Results: Single-night between-device correlations and correlations with PSG were moderate (0.3 ≤ r < 0.7) for some duration- and timing-related measures, but other associations were weak (r < 0.3). Seventy-one percent of sleep measures reached acceptable reliability (ICC ≥ 0.7) within seven nights of aggregation, but the required aggregation window varied across measures, tools and between PLWD and Controls. Reliability-filtered PCA yielded stable and interpretable principal components, but Duration was the only component showing moderate between-device association. Principal components were successfully used to classify PLWD vs Controls, but feature importance varied across devices. Conclusions: Aggregation of CST-derived measures across 7-14 nights yielded reliable measures, most of which were device-specific, with duration being the only essential aspect transferable between devices.

3
Severity of Depression and Anxiety Symptoms Manifest in Physiological and Behavioral Metrics Collected from a Consumer-Grade Wearable Ring

Sameh, A.; Azadifar, S.; Nauha, L.; Karmeniemi, M.; Niemela, M.; Farrahi, V.

2026-02-09 health informatics 10.64898/2026.02.06.26345566 medRxiv
Top 0.1%
17.2%

Wearable devices can capture changes in human behaviors related to mental health, including depression and anxiety. Here, we examined whether and how digital metrics from a consumer-grade wearable smart ring (Oura Ring) differed by severity of depression and anxiety symptoms using data from a large-scale population-based sample of young adults (n=1,290, age range: 33-35). Participants wore the ring for two weeks, during which it recorded sleep architecture, nocturnal heart rate (HR), heart rate variability (HRV), and movement intensity. Mental health symptoms were assessed using the Generalized Anxiety Disorder 7-item and Hopkins Symptom Checklist-25 scales. On average, participants with higher depression and/or anxiety symptoms had lower levels of rapid eye movement sleep, higher levels of deep and light sleep, elevated nocturnal HR, reduced HRV, and lower daytime movement compared to individuals without symptoms. Findings suggest that symptoms of depression and anxiety may manifest in physiological and behavioral metrics collected by consumer-grade wearable devices.

4
Interactive Physical Activity Apps: Do the ABACUS and the MARS Measure Up? A Descriptive Analysis of Behaviour Change Taxonomies

Ori, E. M.; Baay, C.; Ester, M.; Toohey, A. M.

2026-02-22 public and global health 10.64898/2026.02.18.26346599 medRxiv
Top 0.1%
12.4%

The ubiquitous use of digital tools may be beneficial for improving physical activity across diverse populations. It remains unknown, however, how publicly available, cost-free physical activity apps adhere to behaviour change techniques, and how users rate these apps. This study explored the number of publicly available physical activity apps and relationships among behaviour science techniques, subjective quality, and user ratings. We conducted an exploratory content analysis of 17 apps meeting inclusion criteria. The App Behaviour Change Scale (ABACUS) and Mobile App Rating Scale (MARS) were used to code each downloaded app for behaviour change techniques, app functionality, and subjective quality. App store user ratings were also collected along with descriptive data about each app. All apps were commercially affiliated, targeted adult populations, and centered on changing behaviour, setting goals, and addressing physical health. No apps addressed all 21 ABACUS items; apps included 12.8 ± 2.4 indicators, ranging from 8 to 18 indicators. The three most common ABACUS indicators were: i) collection of baseline information, ii) instructional PA content, and iii) ability for app to give user feedback. The three least common ABACUS indicators were: i) ability to export data, ii) consequences for physical activity dis/continuance, and iii) allows for planning of barriers. No apps included all 12 MARS focus areas; 94.1% of apps allowed goal setting, 58.8% addressed physical health, and 41.2% included a mindfulness focus. Linear regressions explored relationships for app user ratings; aggregated MARS domains accounted for 54% of the variance. Publicly available physical activity apps may be a useful approach to improving physical activity uptake and adherence among harder-to-reach populations, including low socioeconomic status groups.
App developers should consider incorporating more behaviour change techniques within cost-free apps to improve user uptake and ultimately improve physical activity associated health outcomes. Author Summary: Digital technology permeates all facets of life and populations, and may contribute to improved health behaviours including physical activity. However, access to supportive technology may be limited by cost, for example, as many popular physical activity apps require paid subscriptions. It is unknown whether cost-free physical activity apps adhere to behaviour change recommendations and how these apps are rated by users. This research explored cost-free, publicly available physical activity apps and their respective relationships with behaviour change techniques as well as app-store user ratings. Only 17 apps met inclusion criteria, and these were compared against one behaviour change scale and one app quality scale. All apps had commercial motivations and focused on physical activity for adult populations. Most commonly, apps collected user info at baseline, provided physical activity instructional content, and provided feedback to users. Apps were generally rated positively by users based on app-store star ratings. Cost-free physical activity apps may be useful tools for individuals who want to improve their physical activity but are limited by their socioeconomic situation. However, greater emphasis on evidence-based behaviour change approaches may be necessary to improve health outcomes for users.

5
Hidden in the Night: Wearable Sleep Assessment of Nocturnal Hypoglycaemia in Type 1 Diabetes

Alsuhaymi, A.; Nutter, P. W.; Thabit, H.; Harper, S.

2026-01-28 health informatics 10.64898/2026.01.22.26344161 medRxiv
Top 0.1%
9.9%

Background: Nocturnal hypoglycaemia (NH) is a common and challenging complication in Type 1 Diabetes (T1D), disrupting blood glucose control and sleep physiology. Its real-world impact on sleep architecture remains poorly characterised. Consumer wearables offer a way to examine these associations under free-living conditions, providing detailed insight into behavioural and physiological responses to nocturnal blood glucose fluctuations. This study aims to assess how wearable-derived sleep metrics and physiological features could be used as indicators of NH, including the effects of how low blood glucose levels fall during hypoglycaemic events and the associated pre-event changes. Methods: We conducted a comparative observational analysis of paired continuous glucose monitoring (CGM) and Garmin smartwatch data collected over 12 weeks from 17 adults with T1D. Nights were categorised as normoglycaemia, hyperglycaemia, hypoglycaemia Level 1 (≥3.1 and <3.9 mmol/L), or hypoglycaemia Level 2 (<3.0 mmol/L). Thirteen sleep metrics, including total sleep time, wake after sleep onset (WASO), sleep-stage proportions, fragmentation indices, and physiological features such as heart rate, were compared using non-parametric tests. Pre-hypoglycaemic event analyses examined 60-minute and 15-minute windows preceding hypoglycaemia to identify early deviations in sleep and physiological metrics. Results: Across 573 nights, 17.5% involved Level 1 and 7.3% Level 2 hypoglycaemia. Level 2 hypoglycaemia was associated with 31 minutes less wakefulness, 17-25 minutes more REM, and up to 74% more deep sleep compared with normoglycaemic nights. Sleep efficiency increased during hypoglycaemic events despite greater fragmentation. Pre-hypoglycaemic episode analyses revealed shorter awake and light-sleep bouts, as well as a 9.8% higher heart rate, preceding Level 2 episodes.
Conclusions: Wearable-derived sleep and physiological signals reveal clear intraindividual changes both before and during NH. Our findings indicate that Level 2 episodes are associated with deeper sleep and reduced behavioural arousal, suggesting that CGM alarms may be less effective at waking individuals during Level 2 NH. By characterising pre-hypoglycaemic changes that differ based on hypoglycaemia level, this work provides preliminary evidence for personalised, wearable-based early-warning systems. Such approaches could help distinguish nocturnal hypoglycaemic events and support more effective alerting, particularly in settings with limited or no access to CGM. Author Summary: Why was this study done? People with Type 1 Diabetes (T1D) frequently experience nocturnal hypoglycaemia (low blood glucose at night), a dangerous event that often goes unnoticed because individuals are less able to recognise symptoms or wake up during sleep. These events also disrupt sleep in ways that are not well characterised under real-world conditions. Limited access to continuous glucose monitoring (CGM), especially in low- and middle-income countries, highlights the need for affordable alternatives to ensure nighttime safety. What did we do and find? Using more than 500 nights of paired smartwatch and CGM data, we investigated how sleep features change when blood glucose levels fall overnight. We found that hypoglycaemic nights show distinct alterations in sleep architecture, including increased REM and deep sleep, and greater micro-fragmentation. A key finding was that Level 2 hypoglycaemia was associated with deeper sleep and reduced wakefulness. This pattern indicates that individuals may be less likely to awaken during more severe events, even when alarms are present. Pre-hypoglycaemic episode analysis revealed additional early-warning signals, such as shorter awake and light-sleep bouts and elevated heart rate, before Level 2 hypoglycaemia occurred.
What do these findings mean? Smartwatches can capture sleep-based changes that appear before and during nocturnal hypoglycaemia. Because deeper sleep during Level 2 episodes may reduce responsiveness to CGM alerts, these results suggest that current alarm approaches could be improved by incorporating sleep features alongside glucose data. Such sleep-informed detection may enhance the reliability of hypoglycaemia alerts, reduce missed events during deep sleep, and provide a foundation for low-cost early-warning systems in settings where CGM is unavailable or unaffordable. Further research is needed in larger and more diverse populations, but this work provides early evidence that wearable-derived sleep features can meaningfully strengthen nocturnal hypoglycaemia detection.

6
Machine Learning Analysis of User Sentiments in Tinnitus Management Apps

Yousaf, M. N.; Anwar, M. N.; Naveed, N.; Haider, U.

2026-02-22 health informatics 10.64898/2026.02.19.26346680 medRxiv
Top 0.1%
9.4%

Background: Tinnitus affects a substantial proportion of the global population and can severely disrupt sleep, mood, and daily functioning, yet the quality of mobile health apps designed for tinnitus management remains highly variable. Traditional evaluation methods, including clinical trials, expert rating scales, and small-scale surveys, rarely capture large-scale, feature-level feedback from real-world users, leaving a gap in understanding which app characteristics drive sustained engagement and satisfaction. Methods: This study analysed 342,520 English-language reviews from 84 tinnitus-related apps on iOS and Android collected between 2015 and 2025. A pipeline first applied VADER-based preprocessing and sentiment assignment, then trained a graph neural network aspect-based sentiment analysis (GNN-ABSA) model operating on sentence-level dependency graphs to infer feature-level sentiment for domains such as sound therapy, sleep support, pricing, advertisements, stability, and user interface. Results: The GNN-ABSA model achieved an accuracy of 84.4% and a macro F1 score of 0.829 on unseen aspect-level test data, indicating stable performance across sentiment classes. Therapeutic features like sound masking and sleep support were associated with predominantly positive sentiment, whereas pricing, advertisements, background playback, and technical stability attracted more neutral or negative feedback over the ten-year period. Conclusions: Large-scale, graph-based feature-level sentiment analysis provides a user-centred perspective that complements clinical trials and expert app quality ratings, offering actionable guidance for developers seeking to prioritize design improvements, supporting clinicians in recommending suitable apps to patients, and informing the design of more explainable and user-driven digital health tools. Trial Registration: Not applicable. This study analysed publicly available app store reviews and did not involve human participants.

7
Wearable-derived cardiovascular fitness age and its lifestyle correlates in 442 adults

Shanmugam, A.; Gupta, K.; Dhawale, N.; Singhal, V.; Kumar, M.; Srinivasan, B.; Narasimhan, V.

2026-03-25 health informatics 10.64898/2026.03.20.26348891 medRxiv
Top 0.1%
8.5%

Cardiovascular age is a powerful risk-communication tool that translates complex physiological data into an intuitive number, yet traditional estimates require clinical testing. Consumer wearables now estimate cardiorespiratory fitness age from photoplethysmography-derived heart rate data, enabling continuous, passive health monitoring, but whether such estimates capture substantive lifestyle variation has not been examined. We characterized Cardio Age, a wearable-derived cardiorespiratory fitness age estimate, in 442 Ultrahuman Ring users across a 12-month window ending February 2026, separating independent lifestyle correlates from direct or indirect algorithmic inputs. The mean Cardio Age gap (CA gap; mean Cardio Age minus chronological age) was -1.84 ± 2.97 years, with 82.6% of participants exhibiting younger estimated cardiovascular ages. Independent lifestyle metrics with no algorithmic link to Cardio Age showed significant associations: sleep efficiency (r = -0.194, p < 0.001), rapid eye movement (REM) sleep (r = -0.203, p < 0.001), sleep duration (r = -0.200, p < 0.001), and daily steps (r = -0.145, p = 0.003). A monotonic body mass index (BMI) dose-response was observed, with underweight participants showing a mean CA gap of -3.73 years versus -0.52 for obese participants. Extreme-group comparisons revealed that users with the youngest cardiovascular ages slept 37 minutes longer, achieved 22 more minutes of REM sleep, and had 1.8% higher sleep efficiency than those with the oldest cardiovascular ages (all p < 0.05). Sustained improvers over 12 months showed a mean CA reduction of 3.24 years, accompanied by decreased resting heart rate (-0.8 bpm, p < 0.001) and increased estimated VO2 max (+1.3 mL/kg/min, p < 0.001), indicating that Cardio Age tracks physiological changes over time.
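
The CA gap defined in this abstract (mean Cardio Age minus chronological age) can be illustrated with a minimal sketch. It assumes the mean is taken over one participant's Cardio Age estimates within the observation window; the function name `cardio_age_gap` and the sample values are hypothetical and are not taken from the preprint.

```python
from statistics import mean


def cardio_age_gap(cardio_ages, chronological_age):
    """Return the CA gap in years: mean Cardio Age minus chronological age.

    A negative gap means the estimated cardiovascular age is younger
    than the participant's chronological age.
    """
    return mean(cardio_ages) - chronological_age


# Hypothetical participant aged 40 with three Cardio Age estimates.
estimates = [37.5, 38.0, 38.5]
gap = cardio_age_gap(estimates, 40.0)  # -2.0 years
is_younger = gap < 0
```

Under this reading, the reported 82.6% of participants with younger estimated cardiovascular ages would correspond to those for whom `gap` is negative.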

8
11 million days of longitudinal wearable data reveal novel future health insights

Fulda, E. S.; Waxse, B. J.; Goleva, S. B.; Tran, T. C.; Taylor, H. J.; Bailey, C. P.; Wolff-Hughes, D. L.; Mo, H.; Zeng, C.; Keaton, J. M.; Ferrara, T. M.; Topiwala, A.; Doherty, A.; Denny, J. C.

2026-01-30 epidemiology 10.64898/2026.01.29.26344899 medRxiv
Top 0.1%
6.9%

Background: Insufficient physical activity (PA) is associated with higher risk of morbidity and premature mortality. Wearable devices offer a scalable, objective measurement of physical activity, but most studies reduce these data to a single activity metric measured over a fixed 7-day period. We compared different wearable-derived phenotyping approaches to understand their impact on activity-disease associations. Methods: We analyzed 11 million days of Fitbit data from 29,351 participants in the All of Us Research Program, deriving four daily activity metrics (step count, peak 1-min cadence, peak 30-min cadence, and heart rate per step) across five time windows (1-day, 1-week, 1-month, 6-months, 1-year). We performed phenome-wide analyses on >700 incident and >1,300 prevalent disease outcomes identified from linked electronic health records. Findings: Among participants with EHR and Fitbit data (mean age 57.3 years, 69% female, 47% with >1 year of Fitbit data), all 20 phenotypes were highly correlated (median Pearson r = 0.71). Longer measurement windows yielded stronger and more stable associations, with 1-year step count associated with 373 prevalent and 37 incident outcomes (versus 231 and 17 for 1-day step count) after Bonferroni correction, including novel associations with chronic pain syndrome, SARS-CoV-2, and autoimmune disease. Differences between prevalent and incident associations suggest that activity metrics can act as both early markers of disease and risk factors. Interpretation: These findings highlight how large-scale, longitudinal wearable data can advance understanding of health and disease and inform scalable approaches for clinical risk stratification. Funding: National Institutes of Health Intramural Research Program, Wellcome Trust. Research in context. Evidence before this study: Low levels of physical activity relate to numerous health outcomes.
However, prior studies are limited by a focus on disease prevalence and by a lack of examination across a broad range of health outcomes. Further, the strength of these associations depends on how physical activity is measured. Prior work shows that wearable devices capture activity more reliably than self-report surveys and typically yield stronger associations with disease risk. Most wearable-based studies rely on short monitoring windows, often seven days or fewer. To our knowledge, no study has systematically evaluated how the duration of wearable-based phenotyping influences estimates of disease risk. To explore this, we searched PubMed using the terms "wearable phenotyping" AND "disease risk", resulting in 48 articles published between 2016 and 2025. Although some studies compared different wearable-derived phenotypes (e.g., step count vs. sleep duration) or explored how the number of observed days affects data quality, none directly evaluated how the length of the phenotyping period shapes associations with disease risk. Added value of this study: Using nearly 11 million person-days of Fitbit data from ~30,000 participants, this study evaluates how four wearable-derived activity metrics, summarized across five time windows, influence estimates of activity-disease associations. We identified over 300 previously unreported associations between any of our four metrics and various health outcomes. Longer phenotyping windows consistently yielded stronger associations than shorter ones, although all windows remained informative. These findings highlight the importance of extended wearable monitoring for robust risk characterization. We further compared incident cases with both prevalent and incident outcomes, illustrating the roles of physical activity as a potentially modifiable risk factor and an early marker of disease. Implications of all the available evidence: These findings have two important implications.
First, longer periods of wearable data collection improve the accuracy of disease risk estimation and should be considered in the design of epidemiologic studies and in the development of clinical guidelines. Although associations between physical activity and disease were directionally consistent across all time windows, effect sizes varied substantially, an observation with important consequences for public health recommendations. Second, this study represents one of the first large-scale demonstrations of long-term wearable monitoring for real-world risk stratification, marking an important advance toward individualized health assessment and intervention.

9
Engagement With a Breath-Based Metabolic Device Is Associated with Greater Weight Loss in Self-Reported Real-World GLP-1RA Users

Ben David, G.; Udasin, R.; Golan, D.; Mor, M.; Mor, M.

2026-02-24 endocrinology 10.64898/2026.02.22.26346841 medRxiv
Top 0.1%
6.9%

Background: Digital health self-monitoring tools are widely used to support weight management and metabolic health. Higher engagement with these tools is often associated with better clinical outcomes; however, real-world engagement-outcome relationships for consumer metabolic monitoring devices remain incompletely characterized, particularly in heterogeneous user populations. Objective: To evaluate whether engagement with a portable breath-based metabolic device (Lumen; Metaflow Ltd.) is associated with greater weight loss and reduction in body fat among real-world glucagon-like peptide-1 receptor agonist (GLP-1RA) users. The study also explores correlations between engagement and a device-specific measure of metabolic flexibility (FLEX score). Methods: We retrospectively analyzed 2,296 adult Lumen users who self-reported GLP-1RA use over 24 weeks. Engagement was quantified as total engagement days over a 24-week period and ordered engagement consistency groups defined by weekly use frequency thresholds. Weight and body fat percentage data were collected by a combination of connected devices and manual user input in the Lumen smartphone application. Associations with weight loss and reduction in body fat percentage were evaluated using linear regression and ANCOVA adjusted for age, baseline BMI, and sex, with HC3 robust standard errors. Body fat percentage data were available for only 490 of the 2,296 subjects. In addition, similar associations were evaluated for FLEX score. GLP-1RA exposure was self-reported at onboarding and not verified longitudinally. Results: At 24 weeks, low/medium/high engagement users lost 3.2%, 4.6%, and 5.2% of body weight (trend p=2.36×10⁻¹¹). Engagement days were associated with percent weight change (slope -0.0214% per day; P(HC3)=7.9×10⁻¹⁸). Engagement days showed a modest association with body fat percentage change (n=490; slope -0.0105% per day; P(HC3)=.010). The adjusted ANCOVA trend across engagement groups was not significant (P=.19).
Engagement days and consistency both showed a highly significant trend toward increased FLEX score (slope +0.0185 per day; P(HC3)=2.0×10⁻³⁶). Conclusions: In a real-world digital health dataset, higher engagement with a breath-based metabolic monitoring device and its smartphone application was associated with greater 24-week weight loss after adjustment for age, baseline BMI, and sex. The absolute difference between low and high engagement (2.0% body weight) is modest but clinically meaningful in real-world settings after 24 weeks of tracking. Associations with body fat percentage change were smaller and not consistently significant in adjusted analyses. Associations with metabolic flexibility were highly significant, but it remains unknown whether this parameter is predictive or reflective. Prospective controlled studies are needed to test causality and determine whether device-driven biofeedback and sustained engagement independently improve outcomes because GLP-1RA use was self-reported and unverified, and the present analysis was observational. These findings should be interpreted as engagement-outcome associations and reflect behavioral motivation and adherence rather than evidence of device efficacy.

10
Design and Rationale of the My Heart Counts Cardiovascular Health Study: a Large-Scale, Fully Digital Biobank, and Randomized Trial of Large Language Model-Driven Coaching of Physical Activity

Schmiedmayer, P.; Johnson, A.; Schuetz, N.; Kollmer, L.; Goldschmidt, P.; Delgado-SanMartin, J.; Zhang, K.; Mantena, S. D.; Tolas, A.; Montalvo, S.; Raimrez Posada, M.; O'Sullivan, J. W.; Oppezzo, M.; King, A. C.; Rodriguez, F.; Ashley, E.; Lawrie, A.; Kim, D. S.

2026-03-03 cardiovascular medicine 10.64898/2026.03.02.26347447 medRxiv
Top 0.1%
6.8%

Background: Cardiovascular disease remains the leading cause of global morbidity and mortality. The original My Heart Counts smartphone application demonstrated the feasibility of large-scale, fully digital recruitment and trial conduct, but was limited by platform exclusivity and the need for human experts to create text-based behavioral interventions. Methods: The next-generation My Heart Counts smartphone application is a prospective, observational cohort study with an embedded randomized crossover trial, evaluating personalized text-based coaching prompts, available in both English and Spanish. All study and trial operations will be conducted via the My Heart Counts smartphone application, re-designed using the open-source Stanford Spezi framework to support iOS, with a planned Android release in 2027. The target enrollment is N=15,000 adults across the United States and United Kingdom. The study establishes a comprehensive digital biobank by synthesizing passive mobile health data (steps, flights climbed, heart rate, sleep, workouts), raw sensor data (e.g., accelerometry), longitudinal clinical surveys, active tasks (6-minute walk test and 12-minute Cooper run test), electrocardiograms (ECG), and electronic health record (EHR) data integrated via HL7 FHIR protocols. The embedded trial evaluates the effect of text-based coaching prompts generated by a large language model (LLM) grounded in the Transtheoretical Model of Change on daily physical activity, as compared to generic prompts. Planned Analysis: The primary endpoint of the randomized crossover trial is change in daily step count between LLM-driven and generic text-based intervention arms, analyzed using mixed-effects models. Secondary endpoints include change in mean active minutes and calorie burn over each intervention week.
Other analyses include the changes in submaximal (6-minute walk test) and maximal (Cooper 12-minute run test) cardiorespiratory fitness, changes to sensor-derived biomarkers (e.g., sleep quality, resting heart rate, and heart rate variability), and association of sensor-derived biomarkers with EHR-confirmed clinical outcomes. Conclusions: By utilizing autonomous, LLM-driven coaching, modular software design, and cross-platform accessibility, our smartphone application-based study will provide a scalable model for inclusive and decentralized preventive care of patients with cardiovascular disease. Trial Status: Recruitment commenced in March 2026 and is ongoing.

11
Physical activity buffers physiological stress during high emotional distress: a wearable-derived prospective cohort study

Pinkerton, C.; Guo, Y.; Qu, A.

2026-04-06 public and global health 10.64898/2026.04.05.26350215 medRxiv
Top 0.1%
6.4%

Background: Digital phenotyping using wearable devices and ecological momentary assessment (EMA) enables continuous, real-world monitoring of physiological and emotional states, but identifying high-risk stress states in real time remains challenging. We examined day-level associations between emotional distress and heart rate variability (HRV), and assessed whether daily physical activity modifies this relationship using longitudinal wearable and EMA data. Methods: The Smart Momentary Interactive Longitudinal Evaluation Study (SMILES) was a prospective cohort study conducted among STEM graduate students in the U.S. in 2025. Participants wore an Oura Ring Generation 3 continuously for five months and completed daily EMA surveys assessing emotional distress. The primary outcome was nightly HRV measured as the root mean square of successive differences and log-transformed for analysis. Quantile regression within a quadratic inference function framework was used to estimate associations at the 25th, 50th, and 75th percentiles of HRV, accounting for within-participant correlation and time-varying covariates. Findings: Thirty-one participants contributed 1,724 person-days of observation. High emotional distress was associated with lower HRV across the HRV distribution, with the strongest association observed at the lower HRV quantile (β = -0.094, 95% CI: [-0.111, -0.078]). A significant interaction between daily step count and emotional distress was observed across quantiles, such that higher physical activity was associated with higher HRV on high emotional distress days but not on low-to-moderate distress days. Interpretation: Integration of wearable-derived physiological data with EMA enables real-time identification of high-risk stress states in naturalistic settings.
The observed buffering effect of physical activity during periods of elevated emotional distress suggests that wearable-guided, personalized just-in-time adaptive interventions, such as physical activity prompts, could be deployed to improve autonomic regulation and mental health.
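
The HRV outcome named in this abstract, the log-transformed root mean square of successive differences (RMSSD), can be illustrated with a minimal sketch. The function names, the sample RR intervals, and the choice of the natural logarithm are assumptions made for illustration; the abstract does not state the log base or how the device derives its intervals.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive differences of RR intervals (in ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def log_rmssd(rr_ms):
    """Natural log of RMSSD (assumption: the abstract does not give the base)."""
    return math.log(rmssd(rr_ms))


# Hypothetical RR intervals in milliseconds, not data from the study.
rr_example = [812.0, 790.0, 845.0, 830.0, 801.0]
nightly_value = log_rmssd(rr_example)
```

The log transform is commonly applied because RMSSD is right-skewed; the quantile regression step described in the abstract is not reproduced here.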

12
Introducing circStudio, a Python package for preprocessing, analyzing and modeling actigraphy data

Marques, D.; Barbosa-Morais, N. L.; Reis, C. C. P.

2026-04-01 bioinformatics 10.64898/2026.03.30.711342 medRxiv
Top 0.1%
6.3%

Actigraphy is a non-invasive and cost-effective method for monitoring behavioral rhythms under real-world conditions by collecting time-resolved measurements of locomotor activity, light exposure, and temperature. Although several open-source packages support specific aspects of actigraphy analysis, aspects such as preprocessing, metric calculation, and mathematical modeling are often distributed across separate software packages, limiting interoperability and increasing programming overhead. Here we introduce circStudio, a Python package that unifies actigraphy data processing and mathematical modeling of circadian rhythms within a single framework. Built from the pyActigraphy codebase and integrating circadian models from the Arcascope circadian package, circStudio provides flexible preprocessing tools, support for multiple actigraphy file formats through adaptor classes, standalone functions for computing commonly used actigraphy metrics, and implementations of several mathematical models of circadian rhythms. The package enables users to move efficiently from raw wearable data to physiologically interpretable circadian outputs. Ultimately, circStudio aims to facilitate reproducible workflows and to provide a flexible foundation for research applications across circadian biology, sleep science, and digital health.

13
Wearable sleep staging using photoplethysmography and accelerometry across sleep apnea severity: a focus on very severe sleep apnea

Ogaki, S.; Kaneda, M.; Nohara, T.; Fujita, S.; Osako, N.; Yagi, T.; Tomita, Y.; Ogata, T.

2026-04-13 health informatics 10.64898/2026.04.09.26350266 medRxiv
Top 0.1%
6.3%

Study Objectives: To evaluate wearable sleep staging across sleep apnea severity, including very severe sleep apnea defined as an apnea-hypopnea index (AHI) ≥ 50 events/h, and to assess how training-set composition affects performance in this subgroup. Methods: We analyzed 552 overnight recordings, 318 from the Sleep Lab Dataset and 234 from the Hospital Dataset. In the Hospital Dataset, 26.5% had very severe sleep apnea. We developed a deep learning model for sleep staging using RR intervals from wrist-worn photoplethysmography and three-axis accelerometry. Baseline performance was assessed by cross-validation under 5-stage and 4-stage staging. We examined night-level associations with AHI severity. We also compared the baseline model with an ablation model trained on the same number of recordings but with more Sleep Lab Dataset and lower-AHI Hospital Dataset recordings, evaluating both models in the very severe subgroup. Results: In 5-stage classification, Cohen's kappa was 0.586 in the Sleep Lab Dataset and 0.446 in the Hospital Dataset. Under 4-stage staging, the gap narrowed, with kappa values of 0.632 and 0.525, respectively. In the Hospital Dataset, performance declined with increasing AHI severity. Among 62 recordings with very severe sleep apnea, reducing high-AHI representation in training lowered kappa from 0.365 to 0.303. Conclusions: Wearable sleep staging performance declined across greater sleep apnea severity in this clinical cohort. Clinical utility may benefit from training data that better represent the target severity spectrum and from selecting staging granularity to match the intended use case. Statement of Significance: Repeated laboratory polysomnography is impractical for long-term sleep apnea management. Wearable sleep staging could support scalable monitoring, yet its reliability in clinically severe sleep apnea has remained unclear. This study developed and evaluated a wearable sleep staging approach in both sleep-laboratory and hospital cohorts. The hospital cohort included many severe and very severe cases. Performance was lower in the hospital cohort and declined with greater sleep apnea severity. A coarser staging scheme reduced the gap between cohorts, and models trained without representative very severe cases performed worse in this target population. These findings highlight the value of severity-aware model development and motivate future multi-night home validation with reliability cues.
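
Cohen's kappa, the agreement metric reported in this abstract, corrects raw epoch-by-epoch agreement for the agreement expected by chance. The sketch below shows the standard calculation; the function name and the toy stage labels are hypothetical, and this is not the authors' evaluation code.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa between two equal-length sequences of stage labels."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of epochs where the labels match.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement from the marginal label frequencies of each rater.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical epoch labels: polysomnography reference vs. wearable output.
psg = ["W", "W", "N2", "N2"]
wearable = ["W", "N2", "N2", "N2"]
kappa = cohens_kappa(psg, wearable)
```

Because chance agreement depends on how the labels are distributed, kappa values under 5-stage and 4-stage schemes are not directly interchangeable, which is worth keeping in mind when reading the comparison above.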

14
Reallocation of 24-hour physical behaviour composition and mortality: exploring effect modification by sleep characteristics

Bian, W.; Ahmadi, M.; Mitchell, J. J.; Biswas, R. K.; Koemel, N. A.; Dumuid, D.; Chastin, S. F.; Blodgett, J. M. F.; Chaput, J.-P.; Hamer, M.; Stamatakis, E.

2026-03-25 epidemiology 10.64898/2026.03.23.26349126 medRxiv
Top 0.1%
6.2%

Time compositions of physical behaviours are associated with premature mortality, but the moderating role of sleep remains unclear. Using data from the UK Biobank accelerometry subsample, we examined associations of time reallocations between five device-measured physical behaviours (sleep, sedentary behaviour (SB), standing, light-intensity (LPA) and moderate-to-vigorous physical activity (MVPA)) with all-cause, cardiovascular disease (CVD) and physical activity-related cancer mortality, and the potential effect modification by sleep duration and regularity. Compositional Cox regression was used to examine associations of behavioural reallocations with mortality. In 58,149 adults, 2,209 deaths occurred over a mean follow-up of 8.0 years. Among participants who met sleep duration guidelines, reallocating 30 minutes from sleep to standing, LPA or MVPA was favourably associated with all-cause mortality with HRs of 0.86 (95% CI 0.79, 0.93), 0.87 (0.80, 0.95), and 0.80 (0.73, 0.87), respectively. Reallocating 30 minutes from sleep to SB, standing, or LPA was adversely associated with CVD risk (HRs 1.08 (1.02, 1.15), 1.10 (1.01, 1.20), and 1.11 (1.03, 1.20)) among those not meeting guidelines. Beneficial associations of reallocating SB to sleep were evident only amongst short (<7h/day) or regular (SRI>87.8) sleepers across mortality outcomes. Our findings support incorporating sleep characteristics into the design and behavioural targets of future personalised behavioural interventions.

15
Proposing the Roommate Sleep Preference Questionnaire (ROOMPREF) with a free online roommate matching tool

Driller, M. W.; Suppiah, H.

2026-03-09 sports medicine 10.64898/2026.03.02.26347046 medRxiv
Top 0.1%
5.0%

Shared sleeping quarters are commonplace in contexts such as athletes at major sporting events, academic dormitories, and military barracks, yet mismatched sleep preferences can undermine rest and, ultimately, human behaviour and performance. We introduce the Roommate Sleep Preference Questionnaire (ROOMPREF), a brief eight-question survey capturing preferences for noise, lighting, and temperature tolerances, snoring behaviour, and chronotype. Responses feed into a free, web-based clustering tool built in Python, which flags preference conflicts and implements adaptive K-Means clustering within sex-chronotype subgroups. A post-cluster swapping algorithm further mitigates residual mismatches, enhancing the room-matching process. The resource includes distribution charts, group summaries, and optional automated room allocations, with downloadable CSV outputs. We demonstrate its application in a pilot cohort, highlighting its potential to improve sleep outcomes across various use-cases. This free resource has the potential to alleviate mismatched rooming partners, resulting in enhanced sleep and wellbeing outcomes.

16
Optimizing Temporal Windows for Wearable-Augmented Post-Discharge Risk Prediction: A Methods Study

Bressman, E.; Park, S.-H.; Greysen, S. R.; Chen, J.

2026-01-23 health informatics 10.64898/2026.01.21.26344487 medRxiv
Top 0.1%
5.0%

Objective: To identify optimal modeling parameters for dynamically predicting hospital readmission risk using post-discharge step-count data from remote monitoring devices. Methods: We combined data from two clinical studies that collected wearable or smartphone-based activity data for up to 6 months after hospital discharge. Analyses were limited to older adults (≥55 years). We constructed a patient-day dataset incorporating static demographic and clinical variables and dynamic activity features aggregated over retrospective windows of 3, 5, 7, or 10 days. Models predicted a composite outcome of readmission or death over prospective horizons of 3, 5, 7, or 10 days, within follow-up periods of 30-180 days. Logistic regression and LightGBM models were trained using 5-fold cross-validation on an 80:20 patient-level split. Results: Among 215 participants, LightGBM outperformed logistic regression across all configurations (mean AUC 0.82 vs 0.76). Performance improved with longer prospective horizons but was largely insensitive to retrospective window length. The LightGBM model was well-calibrated (Hosmer-Lemeshow χ² = 2.46, p = 0.96), whereas logistic regression showed miscalibration (χ² = 51.8, p < 0.001). In feature-importance analyses, LightGBM ranked static (length of stay, vitals, BMI) and dynamic (recent steps, distance) features highly, whereas logistic regression emphasized activity-based variables. Discussion: Prediction performance was impacted by horizon length and training window, with minimal effect of retrospective window. LightGBM achieved higher discrimination and better calibration, supporting flexible, non-parametric methods for post-discharge risk prediction. Conclusion: Post-discharge activity data enhance readmission-risk prediction. Selecting practical temporal windows and appropriate model types can improve accuracy and calibration in wearable-augmented risk models.
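
The patient-day construction described in this abstract pairs a retrospective feature window with a prospective outcome horizon. A minimal sketch of that idea for one patient follows; the function name, the use of a mean as the aggregate, the rule of stopping at the event day, and the sample values are all assumptions, since the abstract does not specify these details.

```python
def build_patient_days(steps, event_day, window, horizon):
    """Build (day, mean_steps, label) rows for one patient.

    steps: daily step counts indexed by day since discharge (0-based).
    event_day: day index of readmission or death, or None if no event.
    window: number of retrospective days aggregated into the feature.
    horizon: number of prospective days in which an event counts as positive.
    """
    rows = []
    for day in range(window - 1, len(steps)):
        # Assumption: no prediction rows on or after the event day.
        if event_day is not None and day >= event_day:
            break
        recent = steps[day - window + 1 : day + 1]
        mean_steps = sum(recent) / window
        # Positive label if the event falls within the next `horizon` days.
        label = int(event_day is not None and day < event_day <= day + horizon)
        rows.append((day, mean_steps, label))
    return rows


# Hypothetical patient: six days of step counts, event on day index 5.
rows = build_patient_days(
    [1000, 2000, 3000, 4000, 5000, 6000], event_day=5, window=3, horizon=2
)
```

Because one patient contributes many rows, the patient-level split mentioned in the abstract matters: splitting by row instead would place the same patient in both training and test data.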

17
Apnea-hypopnea index estimation with wrist-worn photoplethysmography

Fonseca, P.; Ross, M.; van Meulen, F.; Asin, J.; van Gilst, M. M.; Overeem, S.

2026-04-11 health informatics 10.64898/2026.04.08.26350411 medRxiv
Top 0.1%
4.8%

Objective: Long-term monitoring of obstructive sleep apnea (OSA) severity may be relevant for several clinical applications. We developed a method for estimating the apnea-hypopnea index (AHI) using wrist-worn, reflective photoplethysmography (PPG). Approach: A neural network was developed to detect respiratory events using PPG and PPG-derived sleep stages as input. The development database encompassed retrospective data from three polysomnographic datasets (N=3111), including a dataset with concurrent reflective PPG recordings from a wrist-worn device (N=969). The model was pre-trained with (transmissive) finger-PPG signals from all overnight recordings and then fine-tuned to wrist-PPG characteristics using transfer learning. Validation was performed on the test portion of the development set and on a fourth, external hold-out dataset containing both wrist-PPG and PSG data (N=171). Performance was evaluated in terms of AHI estimation accuracy and OSA severity classification. Main Results: The fine-tuned wrist-PPG model demonstrated strong agreement with the PSG-derived gold-standard AHI, achieving intra-class correlation coefficients of 0.87 in the test portion of the development set and 0.91 in the external hold-out validation set. Diagnostic performance was high, with accuracies above 80% for all severity thresholds. Significance: The study highlights the potential of reflective PPG-based AHI estimation, achieving high estimation performance in comparison with PSG. These measurements can be performed with relatively comfortable sensors integrated in convenient wrist-worn wearables, enabling long-term assessment of sleep-disordered breathing, both in a diagnostic phase and during therapy follow-up.

18
Making sleep behaviors interpretable: adapting the two-process model of sleep regulation to longitudinal Fitbit sleep and activity behaviors for health insights

Coleman, P.; Annis, J.; Master, H.; Gustavson, D. E.; Han, L.; Brittain, E.; Ruderfer, D. M.

2026-03-03 health informatics 10.64898/2026.03.01.26347356 medRxiv
Top 0.1%
4.4%

Background: As sleep data from wearable devices are increasingly available in health research, there are new opportunities to understand sleep regulation behaviors as modifiable risk factors for disease. At such a large scale (tens of thousands of people over millions of day-level observations), prioritizing and interpreting sleep behaviors is challenging while maintaining biological relevance and modifiability. In this work, we aim to address this challenge by proposing a framework to interpret Fitbit data through a well-known neurobiological framing of sleep regulation, the two-process model. Methods: We use data from the All of Us Research Program, a national biobank with passively collected Fitbit data for 32,292 people across 15,754,893 total days. We map Fitbit behaviors (b) to either circadian (C) or homeostatic (S) processes. Using iterative exploratory factor analysis to obtain weights, the Fitbit Cb and Sb are then weighted at the level of each day to create Cb and Sb scores. Findings: Cb and Sb scores were found to align with expected real-world relationships with age, seasonality, shift work, and napping. Cb and Sb scores were interpreted with relation to depression, where it was found that Sb scores are highly associated with likelihood of diagnosis (OR = 1.5, p < 2e-16) while Cb and Sb scores are equally associated with severity (Sb score β = 0.2, Cb score β = 0.21, p < 2e-16). Interpretation: Cb and Sb scores support longitudinal interpretation (e.g., changes in Sb around treatment), aggregation (e.g., differences in Cb between two groups), and actionable modification (e.g., reduce naps to improve poor Sb). Overall, our behavior scores allow for interpretation of wearables sleep data and can be utilized across many disease contexts to better understand how sleep influences health. Funding: This work was supported by NIH training grant T32GM145734 and NIH R21HL172038.

19
The React & Rebound Model: Capturing Emotion Regulation Dynamics from Passive Wearable Data

Heusser, A. C.; Simon, T. J.; Elliot, E.; James, C.; Gazzaley, A.; Gibson, N.

2026-03-10 neuroscience 10.64898/2026.03.07.710099 medRxiv
Top 0.1%
4.3%

Background: Emotion regulation--the ability to respond to and restore equilibrium after emotional perturbations--is central to mental health. Yet objective measurement remains limited to lab-based studies with group-level results, while consumer wearables focus on physical activity-related metrics rather than emotional dynamics. Objective: We aimed to develop computational models that extract personalized, interpretable emotion regulation parameters from continuous heart rate variability (HRV) data collected via consumer wearables during everyday life, and validate these parameters against self-reported anxiety symptoms. Methods: We analyzed 4 weeks of continuous HRV data from N = 49 healthy adults wearing Samsung Galaxy Active 2 smartwatches. We derived a continuous autonomic balance signal and developed three computational modeling approaches of increasing sophistication: (1) a static sympathetic load metric, (2) an Ornstein-Uhlenbeck (OU) dynamical systems model capturing continuous restoration dynamics, and (3) a discrete-state Markov transition model--the React & Rebound model--capturing reactivity and rebound dynamics. All models were estimated using joint hierarchical Bayesian models that simultaneously extract subject-specific parameters from HRV time series and estimate their association with Generalized Anxiety Disorder 7-item scale (GAD-7) scores. The validity of extracted parameters was evaluated against anxiety symptom severity. Results: Static sympathetic load correlated modestly with GAD-7 (r = 0.39, R² = 0.16). The OU model captured 69% of variance (R² = 0.69), and the React & Rebound model captured 60% (R² = 0.60) with substantially fewer parameters. Both models revealed that anxiety symptom severity is associated with the interaction between activation and restoration parameters--not either alone. Fast rebound appeared protective even for highly reactive individuals, who scored comparably to low-reactivity groups when restoration was rapid (Cohen's d = 1.17 between highest- and lowest-risk quadrants). In the OU model, the interaction effect was specific to GAD-7 scores versus PHQ-9 and ISI scores; in the React & Rebound model, the interaction was credible across all three symptom measures. Both models were unchanged after controlling for physical activity (ΔR² < 0.002). Conclusions: Computational models can extract interpretable emotion regulation parameters from naturalistic wearable data. The React & Rebound model yields two personalized parameters--reactivity and rebound--that are strongly associated with anxiety symptoms and define meaningful autonomic profiles. These parameters bridge autonomic dynamics measurable via consumer devices to neural circuit models of emotion regulation, with implications for characterizing individual autonomic profiles via consumer wearables.
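
For readers unfamiliar with the Ornstein-Uhlenbeck (OU) process named in this abstract, the sketch below simulates a generic OU path using its exact discretization. It illustrates only the mean-reverting dynamics; the parameter names (`mu`, `theta`, `sigma`) and values are assumptions, and the authors' autonomic balance signal and hierarchical Bayesian estimation are not reproduced.

```python
import math
import random


def simulate_ou(x0, mu, theta, sigma, dt, n_steps, seed=0):
    """Simulate dx = theta * (mu - x) dt + sigma dW on a regular time grid.

    theta > 0 sets how fast the signal is pulled back toward the baseline mu;
    sigma scales the random perturbations.
    """
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    # Standard deviation of the noise accumulated over one step of length dt.
    noise_sd = sigma * math.sqrt((1.0 - decay * decay) / (2.0 * theta))
    path = [x0]
    for _ in range(n_steps):
        x = mu + (path[-1] - mu) * decay + noise_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Hypothetical parameters: start perturbed at 2.0 and relax toward 0.0.
fast = simulate_ou(x0=2.0, mu=0.0, theta=1.0, sigma=0.1, dt=1.0, n_steps=10)
slow = simulate_ou(x0=2.0, mu=0.0, theta=0.1, sigma=0.1, dt=1.0, n_steps=10)
```

In this generic form a larger `theta` means faster return to baseline after a perturbation, which is the kind of restoration behaviour the abstract associates with the OU model.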

20
Conversational, Longitudinal, Ecological Assessment (CLEA): Exploring a new AI-driven method for qualitative data collection in a behavioural health context

Downes, S.; Krys, T.; O'Hara, K.; Western, M.; Thompson, L.; Brigden, A.

2026-01-23 health informatics 10.64898/2026.01.20.26344494 medRxiv
Top 0.1%
4.0%

In this paper, we present conversational longitudinal ecological assessment (CLEA), a novel conversational AI-enabled method for collecting ecologically valid, temporally sensitive qualitative health data via mobile instant messaging. We report findings from an exploratory deployment of an instantiation of CLEA within a 12-week community-based weight management programme, delivered by a charity partner in an area of deprivation. Using WhatsApp, we deployed our CLEA chat-agent to conduct twice-weekly conversational data collection sessions with participants, to elicit data about their experience of the programme and associated behaviour change. This was followed by in-person semi-structured interviews (N = 9) to examine user experiences and perceptions of interacting with the chat-agent. Participants reported that WhatsApp's familiarity supported accessibility and sustained engagement, while the conversational format encouraged reflection directed towards the research focus. Responding to chat-agent prompts required cognitive effort, leading some participants to defer engagement until they had adequate time and mental space; however, this reflective demand was largely experienced as beneficial within the programme context. The AI's quasi-human interactional qualities fostered a sense of support while reducing social judgement, enabling more candid disclosure. Together, these findings demonstrate initial feasibility and acceptability of CLEA for longitudinal qualitative data collection in an underserved population, and illustrate its capacity to elicit meaningful, contextually grounded insights consistently over time, that can be used in the formative stage of digital health intervention development. The study highlights both the opportunities and trade-offs of conversational AI for qualitative data collection, including design implications for health researchers looking to implement or extend the method. Finally, we position CLEA in relation to other longitudinal methods of health data elicitation. Author summary: Developing effective interventions for health behaviours such as healthy eating and physical activity requires methods that can capture the complex, individual factors shaping people's everyday experiences, including stress and motivation. Because such factors often fluctuate over time, longitudinal approaches are needed to understand how experiences and behaviours unfold in real-world contexts. For such methods to be effective, they must also be acceptable, engaging, and accessible--particularly for underserved or disadvantaged populations who are disproportionately affected by health-related conditions such as obesity. In this study, we introduce conversational longitudinal ecological assessment (CLEA), a digital health method that uses conversational AI technology to collect ecologically valid qualitative data over time through an accessible communication platform. We demonstrate the feasibility, acceptability, and utility of CLEA through a real-world deployment investigating an underserved group's experience of a community-based weight management programme. To support other health researchers, we position CLEA in relation to existing longitudinal methods and highlight the key design considerations that shape engagement, data quality, and participant experience.